Assessing node risk and vulnerability in epidemics on networks

Which nodes are most vulnerable to an epidemic spreading through a network, and which carry
the highest risk of causing a major outbreak if they are the source of the infection? Here we show 
how these questions can be answered to good approximation using the cavity method. Several 
curious properties of node vulnerability and risk are explored: some nodes are more vulnerable than 
others to weaker infections, yet less vulnerable to stronger ones; a node is always more likely to be 
caught in an outbreak than it is to start one, except when the disease has a deterministic lifetime; 
the rank order of node risk depends on the details of the distribution of infectious periods. 


I. INTRODUCTION 


Network structure has a profound influence over disease dynamics. There is growing acknowledgement of this fact in the epidemiology literature (see [?] for a review), where the question of how best to quantify and predict network effects is becoming a central challenge [21]. At the same time, network epidemic models are of considerable interest to the theoretical physics community [?], serving as a canonical example of a non-equilibrium process, and providing fertile ground for the application of statistical mechanics techniques to new and interesting problems. One such technique is the cavity method: invented to tackle problems in statistical inference [?] and condensed matter physics [?], this simple and versatile method has found application in areas as diverse as computer science [?] and random matrix theory [23], [?].

The cavity method was first applied to epidemiology in [9] (using the alternative moniker of ‘message-passing’), to calculate the expected time development of a network epidemic. The principal advance provided by the method in that work was the ability to model non-Markov epidemics, in which the time that a node remains infective (the lifetime distribution, or infectious period) is not a simple memoryless exponential distribution. Under a second alias of ‘belief-propagation’, it has also been applied to the problem of tracing the most likely source of a disease outbreak [?]. This work points to another advantage offered by the cavity method: it is not restricted to calculating macroscopic quantities, but can provide information about the behaviour of individual nodes. Indeed, it is now well-recognised that population heterogeneity can play a very important role in disease outbreaks [?].

In this article, we show how to calculate a measure of the infection risk that a particular node poses to the network, as well as the risk posed by the network to the node. These quantities are manifested in two different realisations of the cavity method, one with ‘upstream’ cavities, the other ‘downstream’. The derivation of the cavity equations for the risk and vulnerability measures is given in the next section, followed by a discussion of their numerical solution. We then address the role of the lifetime distribution of the disease, which controls the probability of an outbreak occurring, but paradoxically has no effect on its final size. Finally, we show how the cavity method can be used to rank nodes according to their risk or vulnerability, and how these rankings exhibit curious dependencies on the disease dynamics.

FIG. 1: Imagining the progress of the disease as a flow through the network, (6) considers the onward spread after removing an upstream neighbour (i.e. closer to the source of the infection), whilst (8) considers the chance of infection with a downstream neighbour removed. In both cases the tree approximation amounts to assuming that the up- and downstream neighbours are uniquely defined.


II. CAVITY METHOD FOR RISK AND 
VULNERABILITY 

Consider the spread of a disease over the nodes of a network. In a small period of time $dt$ each infected node has a chance $\beta\,dt$ of transmitting the infection to an uninfected neighbour, and a chance $\gamma(t)\,dt$ of recovering from the disease. Recovered nodes cannot be reinfected. This is an SIR epidemic with Poisson rate infections ($\beta$ is a positive constant) and general disease lifetime distribution $\gamma(t)$.

Write $r_i$ for the risk that a major outbreak occurs if node $i$ is the sole initial infective. Thinking about the limit of very large networks, we make an artificial distinction between the local area around node $i$, and the rest of the network, which we call the ‘bulk’. We will say a major outbreak occurs if node $i$ succeeds in transmitting the infection to a non-zero fraction of the bulk. To compute an approximation to this quantity, we examine the spread of the disease away from the initial infection.

We begin by calculating the probability that $i$ infects all the vertices in some subset $J$ of its neighbours $\mathcal{N}(i)$, and no others. The fates of these nodes are correlated by their joint exposure to the random length of the infectious period of $i$: a long infectious period implies that nearly all neighbours will catch the disease, and the opposite holds if $i$ is infectious only for a very short time. If the infectious period has probability density $\gamma(t)$, then we may write

$$P(i \to J) = \int_0^\infty \gamma(t) \prod_{j \in J} \left(1 - e^{-\beta t}\right) \prod_{j \in \mathcal{N}(i) \setminus J} e^{-\beta t} \, dt = \int_0^\infty \gamma(t) \left(1 - e^{-\beta t}\right)^{|J|} e^{-\beta t \left(|\mathcal{N}(i)| - |J|\right)} \, dt . \tag{1}$$

After $i$ has infected some of its neighbours, the continued survival of the disease depends on those neighbours propagating the infection away from $i$. Summing over all possible collections of infected neighbours we find

$$r_i = 1 - \sum_{J \subseteq \mathcal{N}(i)} P(i \to J) \left(1 - r_J^{(i)}\right), \tag{2}$$

where $r_J^{(i)}$ is the probability of a major outbreak in a network where each $j \in J$ is initially infected, and node $i$ has been removed. In words, this equation says that the disease starting at $i$ dies out if, whenever $i$ infects a set $J$, the disease starting at $J$ also dies out.

The task now is to express $r_J^{(i)}$ in terms of the single-node cavity risks $r_j^{(i)}$. If the network contains cycles, then some nodes will be reachable from two or more members of $J$, yet at most one of them can succeed in passing on the infection. This gives rise to the inequality

$$1 - r_J^{(i)} \geq \prod_{j \in J} \left(1 - r_j^{(i)}\right). \tag{3}$$

To compute a bound on $r_i$ it will therefore suffice to assume equality in (3). This approximation is equivalent to assuming that the network is large and tree-like. As we will see, the results remain surprisingly accurate for networks which do not fit this assumption very well. Indeed the “unreasonable effectiveness” of tree approximations appears to be a quite general phenomenon for network processes [16].

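Inserting (1) and (3) into (2) and assuming equality we can compute

$$r_i = 1 - \int_0^\infty \gamma(t) \prod_{j \in \mathcal{N}(i)} \left[ 1 - \left(1 - e^{-\beta t}\right) r_j^{(i)} \right] dt = 1 - \sum_{J \subseteq \mathcal{N}(i)} (-1)^{|J|} \, T_{|J|} \prod_{j \in J} r_j^{(i)}, \tag{4}$$

where

$$T_n = \int_0^\infty \gamma(t) \left(1 - e^{-\beta t}\right)^n dt . \tag{5}$$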
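The step from (1)-(3) to (4) is stated without intermediate working. One way to fill it in, assuming equality in (3), is sketched below; the shorthand $a = 1 - e^{-\beta t}$ is introduced here purely for convenience and is not part of the original notation.

```latex
\begin{align*}
\sum_{J \subseteq \mathcal{N}(i)} P(i \to J) \prod_{j \in J} \bigl(1 - r_j^{(i)}\bigr)
 &= \int_0^\infty \gamma(t) \sum_{J \subseteq \mathcal{N}(i)}
    \prod_{j \in J} (1 - e^{-\beta t}) \bigl(1 - r_j^{(i)}\bigr)
    \prod_{j \in \mathcal{N}(i) \setminus J} e^{-\beta t} \, dt \\
 &= \int_0^\infty \gamma(t) \prod_{j \in \mathcal{N}(i)}
    \Bigl[ (1 - e^{-\beta t}) \bigl(1 - r_j^{(i)}\bigr) + e^{-\beta t} \Bigr] dt \\
 &= \int_0^\infty \gamma(t) \prod_{j \in \mathcal{N}(i)}
    \Bigl[ 1 - (1 - e^{-\beta t}) \, r_j^{(i)} \Bigr] dt , \\[4pt]
\prod_{j \in \mathcal{N}(i)} \Bigl[ 1 - a \, r_j^{(i)} \Bigr]
 &= \sum_{J \subseteq \mathcal{N}(i)} (-a)^{|J|} \prod_{j \in J} r_j^{(i)} ,
 \qquad a = 1 - e^{-\beta t} .
\end{align*}
```

The second line uses the fact that a sum over subsets of a product of per-neighbour factors is the product of per-neighbour sums. Integrating the final identity against $\gamma(t)$ term by term produces the alternating sum over $T_{|J|}$ in (4).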


The problem of computing the risk to the network posed by node $i$ has thus been transformed into that of computing the risk posed by the neighbours of $i$ in the cavity graph. Repeating the same calculations for $r_j^{(i)}$ allows us to derive a closed set of equations

$$r_j^{(i)} = 1 - \sum_{L \subseteq \mathcal{N}(j) \setminus i} (-1)^{|L|} \, T_{|L|} \prod_{l \in L} r_l^{(j)} . \tag{6}$$

These are the upstream cavity equations. The derivation requires the additional assumption that the risk posed by $l$ in the network with $i$ and $j$ both removed is well-approximated by $r_l^{(j)}$, that is, we ignore any alternative path between $l$ and $i$. If we are able to solve these equations (more on this later), the results can be fed into equation (4) to provide us with an estimate of the probability of any given node causing a major outbreak.

We may also ask how vulnerable node $i$ is to an outbreak that starts somewhere else in the network. Write $v_i$ for the probability that $i$ is eventually infected in the event of a major outbreak. To compute the cavity approximation for vulnerability, we trace the possible path of the disease downstream from the bulk to node $i$. The calculation is somewhat easier in this case, since each neighbour of $i$ has an independent chance to attempt to infect it. This means that the detail of the distribution of infectious periods no longer matters, only the overall chance of an infection, $T_1$ (known as the transmissibility). The result, previously derived in [?], is

$$v_i = 1 - \prod_{j \in \mathcal{N}(i)} \left(1 - T_1 v_j^{(i)}\right), \tag{7}$$

where the downstream cavity equations are

$$v_j^{(i)} = 1 - \prod_{l \in \mathcal{N}(j) \setminus i} \left(1 - T_1 v_l^{(j)}\right), \qquad \forall\, i, j . \tag{8}$$

The distinction between the up- and downstream cavity equations should hopefully be clear: see Figure 1 for a cartoon illustration. It should also be noted that there is a close relationship between the cavity equations and the equation for the survival probability of a multi-type branching process [?]. This should not be surprising, as the cavity equations are derived from considerations of the early development of the epidemic, where the branching process approximation is known to apply [?].


III. ITERATION AND PERCOLATION 

Typically, non-trivial solutions to the cavity equations in either direction cannot be found by hand. Fortunately, for general networks the cavity equations are stable under iteration and can be numerically solved to high accuracy with little computational cost. To do this efficiently, it is helpful to map to a new network encoding the relations between edges in the cavity equations. Each undirected edge $(i,j)$ spawns two nodes, $(i \to j)$ and $(j \to i)$. Between two nodes $e = (i \to j)$ and $e' = (i' \to j')$ in the new network, we draw a directed edge from $e$ to $e'$ if $r_{j'}^{(i')}$ is involved in the cavity equation for $r_j^{(i)}$, that is, if $i' = j$ and $j' \in \mathcal{N}(j) \setminus i$.

FIG. 2: Illustration of the non-backtracking network (dark, directed network) for a simple four-node network (pale, undirected network). Each edge in the underlying network spawns a pair of new nodes, one for each possible direction of infection, which are linked if they appear in the cavity equations.
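The construction of the non-backtracking network can be sketched in a few lines. The following is a minimal illustration, not the author's implementation: the function name `non_backtracking_edges`, the adjacency-dictionary representation and the particular four-node example (a triangle with one pendant node) are all assumptions made here for concreteness.

```python
def non_backtracking_edges(adjacency):
    """Map each directed edge (i, j) to the directed edges (j, l) with l != i.

    `adjacency` is a dict sending each node to a list of its neighbours
    (an assumed representation of the undirected network).
    """
    successors = {}
    for i, neighbours in adjacency.items():
        for j in neighbours:
            # e = (i -> j) links to e' = (j -> l) for every l in N(j) \ i
            successors[(i, j)] = [(j, l) for l in adjacency[j] if l != i]
    return successors


# Hypothetical four-node network: triangle 0-1-2 with node 3 attached to node 2.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
nb = non_backtracking_edges(adjacency)

for edge, succ in sorted(nb.items()):
    print(edge, "->", succ)
```

Each undirected edge contributes two keys, so the dictionary has twice as many entries as the network has edges. The directed edge pointing into the pendant node has no successors, reflecting the fact that the infection cannot travel onward from a leaf without backtracking.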

The adjacency matrix $B$ of this network is known as the non-backtracking, or Hashimoto, matrix. It has recently been shown to be of use in spectral clustering [?], and occurs in the calculation of the percolation threshold in sparse networks [3, ?]. Using this construction, the upstream and downstream cavity equations can be rewritten as

$$\mathbf{r} = U(\mathbf{r}) \quad \text{and} \quad \mathbf{v} = D(\mathbf{v}) , \tag{9}$$

where $U$ and $D$ are the vector functions

$$U_e(\mathbf{x}) = 1 - \sum_{E \subseteq \mathcal{N}(e)} (-1)^{|E|} \, T_{|E|} \prod_{e' \in E} x_{e'} , \qquad D_e(\mathbf{x}) = 1 - \prod_{e' \in \mathcal{N}(e)} \left(1 - T_1 x_{e'}\right) . \tag{10}$$

Starting from the initial vector $x_e = 1$ for all $e$, solutions to (9) can be found by repeatedly applying the maps $U$ or $D$. For $n \geq 1$, the quantity $U^n_{(i \to j)}(\mathbf{1})$ describes the risk that the disease spreads at least distance $n$ from node $j$ in the absence of node $i$. Similarly, $D^n_{(i \to j)}(\mathbf{1})$ gives the risk that $j$ is infected if all nodes in the cavity network at distance greater than $n$ are themselves infected. Clearly these quantities are decreasing with $n$ and bounded from below by zero, hence the iteration scheme is guaranteed to converge.
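The iteration scheme can be illustrated with a short, self-contained sketch. It assumes Markov lifetimes $\gamma(t) = e^{-t}$, for which the integrals $T_n$ have a closed form, and it evaluates the subset sum in $U_e$ through elementary symmetric polynomials rather than by enumerating subsets. All function names, the complete four-node test network and the value $\beta = 4$ are choices made here for illustration only.

```python
from math import comb


def non_backtracking_edges(adjacency):
    """Map each directed edge (i, j) to its successors (j, l) with l != i."""
    return {(i, j): [(j, l) for l in adjacency[j] if l != i]
            for i in adjacency for j in adjacency[i]}


def T(n, beta):
    """T_n of equation (5), ASSUMING exponential lifetimes gamma(t) = exp(-t)."""
    return sum((-1) ** m * comb(n, m) / (1.0 + m * beta) for m in range(n + 1))


def elementary_symmetric(values):
    """Return [e_0, ..., e_d]: e_k is the sum over k-subsets of their products."""
    e = [1.0] + [0.0] * len(values)
    for x in values:
        for k in range(len(values), 0, -1):
            e[k] += x * e[k - 1]
    return e


def upstream_map(x, links, beta):
    """One application of U: 1 - sum_k (-1)^k T_k e_k(successor values)."""
    out = {}
    for e, succ in links.items():
        sym = elementary_symmetric([x[f] for f in succ])
        out[e] = 1.0 - sum((-1) ** k * T(k, beta) * sym[k] for k in range(len(sym)))
    return out


def downstream_map(x, links, beta):
    """One application of D: 1 - prod (1 - T_1 x_f) over successors f."""
    t1 = T(1, beta)
    out = {}
    for e, succ in links.items():
        prod = 1.0
        for f in succ:
            prod *= 1.0 - t1 * x[f]
        out[e] = 1.0 - prod
    return out


def iterate(step, links, beta, tol=1e-12, max_iter=10_000):
    """Apply `step` repeatedly, starting from the all-ones vector."""
    x = {e: 1.0 for e in links}
    for _ in range(max_iter):
        new = step(x, links, beta)
        if max(abs(new[e] - x[e]) for e in links) < tol:
            return new
        x = new
    return x


# Hypothetical example: complete graph on four nodes, beta = 4 (so T_1 = 0.8).
adjacency = {i: [j for j in range(4) if j != i] for i in range(4)}
nb = non_backtracking_edges(adjacency)
beta = 4.0
r_cav = iterate(upstream_map, nb, beta)     # cavity risks r_j^{(i)}
v_cav = iterate(downstream_map, nb, beta)   # cavity vulnerabilities v_j^{(i)}

# Node-level quantities (4) and (7): same maps, summing over all neighbours of i.
roots = {i: [(i, j) for j in adjacency[i]] for i in adjacency}
risk = upstream_map(r_cav, roots, beta)
vulnerability = downstream_map(v_cav, roots, beta)
print(risk[0], vulnerability[0])
```

On this regular example every directed edge carries the same value, so the fixed points can be checked by hand: the downstream equation reduces to $x = 1 - (1 - T_1 x)^2$ and the upstream one to $x = 2 T_1 x - T_2 x^2$.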

For efficient numerical implementation, the integrals (5) should of course be precomputed (indeed this can be accomplished analytically whenever the Laplace transform of $\gamma$ is known). It is possibly less obvious that a substantial speed-up can often be gained by also precomputing the explicit form of the cavity equations themselves. That is, rather than having the computer loop through the subsets $E \subseteq \mathcal{N}(e)$ during each iteration step, an explicit iteration function can be procedurally generated from (10) before starting the main loop.
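The remark about the Laplace transform can be made concrete. Expanding $(1 - e^{-\beta t})^n$ with the binomial theorem gives $T_n = \sum_{m=0}^{n} \binom{n}{m} (-1)^m \hat{\gamma}(m\beta)$, where $\hat{\gamma}(s) = \int_0^\infty \gamma(t) e^{-st}\,dt$. The sketch below compares this with a crude midpoint-rule integration; the function names and the choice of test distributions (unit-rate exponential, and an Erlang distribution with shape 2 and mean 1) are assumptions made for illustration.

```python
from math import comb, exp


def T_from_laplace(n, beta, laplace):
    """T_n via the binomial expansion, given the Laplace transform of gamma."""
    return sum((-1) ** m * comb(n, m) * laplace(m * beta) for m in range(n + 1))


def T_numeric(n, beta, density, t_max=60.0, steps=200_000):
    """T_n by midpoint-rule integration of gamma(t) (1 - exp(-beta t))^n."""
    h = t_max / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += density(t) * (1.0 - exp(-beta * t)) ** n
    return h * total


# Exponential lifetimes gamma(t) = exp(-t): Laplace transform 1 / (1 + s).
exp_laplace = lambda s: 1.0 / (1.0 + s)
exp_density = lambda t: exp(-t)

# Erlang lifetimes with shape 2 and mean 1: Laplace transform (2 / (2 + s))^2.
erlang_laplace = lambda s: (2.0 / (2.0 + s)) ** 2

beta = 1.0
for n in (1, 2, 3):
    print(n, T_from_laplace(n, beta, exp_laplace), T_numeric(n, beta, exp_density))
```

For the exponential case with $\beta = 1$ the closed form gives $T_1 = 1/2$ and $T_2 = 1/3$, which the numerical integration reproduces to within its discretisation error.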

Notice that the upstream and downstream equations (9) both admit zero as a solution, since it is always possible that the disease fails to spread. For some parameter values, the zero solution is unstable and the cavity equations admit a solution in $(0,1)$ corresponding to disease outbreak. The regimes of extinction and outbreak are separated by a percolation phase transition, and the critical parameter values can be determined by examining the stability of the maps $U$ and $D$ around zero. From (10) we compute

$$\frac{\partial U_e}{\partial x_{e'}} = -\sum_{E \subseteq \mathcal{N}(e)} \mathbb{I}\{e' \in E\} \, (-1)^{|E|} \, T_{|E|} \prod_{e'' \in E \setminus e'} x_{e''} , \qquad \frac{\partial D_e}{\partial x_{e'}} = T_1 \, \mathbb{I}\{e' \in \mathcal{N}(e)\} \prod_{e'' \in \mathcal{N}(e) \setminus e'} \left(1 - T_1 x_{e''}\right) , \tag{11}$$

where $\mathbb{I}$ is the indicator function giving one for true arguments and zero for false. At the zero fixed point ($x_e = 0$ for all $e$) we find that the Jacobian matrix is the same for both systems,

$$\left. \frac{\partial U_e}{\partial x_{e'}} \right|_{\mathbf{x} = 0} = \left. \frac{\partial D_e}{\partial x_{e'}} \right|_{\mathbf{x} = 0} = T_1 B_{e,e'} , \tag{12}$$

where $B$ is the non-backtracking matrix. Major outbreaks are therefore only possible if $T_1 > T_c = 1/|\lambda_{\max}(B)|$, where $T_c$ is the critical value for the percolation transition. Note that the transition point is the same for all nodes, regardless of any heterogeneity in the network. As shown in [?], the role of $\lambda_{\max}(B)$ is a general result for percolation processes on sparse networks.
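The threshold $T_c = 1/|\lambda_{\max}(B)|$ can be estimated without forming $B$ explicitly, since multiplying by $B$ only requires summing over the successors of each directed edge. The sketch below uses plain power iteration with sup-norm normalisation. This is one possible approach, chosen here for brevity: it assumes the leading eigenvalue is the unique eigenvalue of largest modulus (which can fail for periodic structures), and the function names are hypothetical.

```python
def non_backtracking_edges(adjacency):
    """Map each directed edge (i, j) to its successors (j, l) with l != i."""
    return {(i, j): [(j, l) for l in adjacency[j] if l != i]
            for i in adjacency for j in adjacency[i]}


def leading_eigenvalue(nb, iterations=1000):
    """Estimate lambda_max(B) by power iteration, starting from all ones."""
    x = {e: 1.0 for e in nb}
    lam = 0.0
    for _ in range(iterations):
        # (B x)_e = sum of x over the successors of e
        y = {e: sum(x[f] for f in succ) for e, succ in nb.items()}
        lam = max(y.values())
        if lam == 0.0:
            # B is nilpotent: no arbitrarily long non-backtracking walks exist
            return 0.0
        x = {e: value / lam for e, value in y.items()}
    return lam


def critical_transmissibility(adjacency):
    """Return T_c = 1 / lambda_max(B), or infinity if lambda_max(B) = 0."""
    lam = leading_eigenvalue(non_backtracking_edges(adjacency))
    return float("inf") if lam == 0.0 else 1.0 / lam


# Hypothetical examples: the complete graph on four nodes, and a three-node path.
complete4 = {i: [j for j in range(4) if j != i] for i in range(4)}
path3 = {0: [1], 1: [0, 2], 2: [1]}
print(critical_transmissibility(complete4), critical_transmissibility(path3))
```

For a $d$-regular network every row of $B$ sums to $d - 1$, so the complete graph on four nodes gives $\lambda_{\max}(B) = 2$ and $T_c = 1/2$. On a tree the non-backtracking matrix is nilpotent and no finite threshold exists, which the sketch reports as infinity.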


IV. THE ROLE OF LIFETIME DISTRIBUTION


As briefly mentioned earlier, although the probability of a major outbreak occurring depends on the detail of the lifetime distribution of the disease, the chance of a particular node catching the disease does not. Two different diseases may have the same transmissibility $T_1$, but one may be more dangerous than the other by virtue of the fact that it is more likely to cause a major outbreak. For fixed transmissibility, which lifetime distribution $\gamma(t)$ is the most dangerous?

FIG. 3: Outbreak probability and final size for epidemics with $\beta = 1$ and Weibull-distributed lifetimes, on a random 4-regular network with $N = 500$ nodes. Thin dark lines show the results of simulations for the fraction of outbreaks affecting more than 10% of the population (lower, purple) and the average final size of those outbreaks (upper, green). Fat pale lines correspond to the risk and vulnerability predictions of the cavity method. The dashed line shows the theoretical lower bound for risk, computed using equation (15). The centre line $\kappa = 1$ corresponds to Markov dynamics.

FIG. 4: Node vulnerability as a function of $\beta$ in a random network of $N = 10^3$ nodes with degrees five and three (in 50/50 ratio), generated by the configuration model, with Markov ($\gamma(t) = e^{-t}$) disease dynamics. Grey lines show the result of the cavity equations, with nodes number 10 and 75 highlighted in solid green and dashed blue, respectively. Circles and diamonds show the results of stochastic simulations for these two nodes, averaged over $10^4$ samples.

To maximise the chance of an outbreak occurring, the disease must guard against the possibility of it dying out in the early stages of its spread. This is achieved by having a deterministic infectious period

$$\gamma(t) = \delta\!\left(t + \frac{1}{\beta} \ln(1 - T_1)\right) . \tag{13}$$

Mathematically, this follows from Jensen’s inequality. Moving the power of $n$ outside the integral in equation (5) we find that for $n \geq 2$ we have $T_n \geq (T_1)^n$. From the cavity equations (6) we then deduce

$$r_j^{(i)} \leq 1 - \prod_{l \in \mathcal{N}(j) \setminus i} \left(1 - r_l^{(j)} T_1\right) , \tag{14}$$

with equality if and only if $\gamma(t)$ is a delta function. This result was mentioned in [25], and goes back to older work using percolation theory [5].
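The Jensen step can be written out explicitly. In the sketch below $\tau$ denotes a random infectious period with density $\gamma$; this symbol is introduced here for convenience and does not appear in the original text.

```latex
\[
T_n = \int_0^\infty \gamma(t) \, (1 - e^{-\beta t})^n \, dt
    = \mathbb{E}\bigl[ (1 - e^{-\beta \tau})^n \bigr]
    \;\geq\; \bigl( \mathbb{E}\bigl[ 1 - e^{-\beta \tau} \bigr] \bigr)^n
    = (T_1)^n , \qquad n \geq 2 .
\]
```

The inequality holds because $x \mapsto x^n$ is convex on $[0,1]$ for $n \geq 2$. Equality requires $1 - e^{-\beta \tau}$ to be almost surely constant, that is, a deterministic infectious period, whose length is then fixed by $1 - e^{-\beta \tau} = T_1$.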

We may also ask how low the probability of an outbreak can get when $T_1$ is fixed. It follows immediately from the definition (5) that the sequence $\{T_n\}$ is positive and decreasing. Thus

[Equation (15), the lower bound on risk, is missing from the source text.]

This rough bound gives the intuition that the risk is minimal when the decay rate of $\{T_n\}$ is also minimal. To achieve this requires a lifetime distribution that is sharply peaked at zero and has a heavy tail.

The Weibull family of distributions [26] describes the time to failure in systems depending on several components, and is a natural choice to model non-Markov disease lifetime distributions (as used in [?], for example). In fact, these distributions nicely illustrate the full range of behaviours between the upper and lower bounds given above. Figure 3 shows the results of simulations of epidemics with Weibull-distributed lifetimes, $\gamma(t) = \kappa t^{\kappa - 1} e^{-t^{\kappa}}$, where the parameter $\kappa$ controls the shape of the distribution. The infection rate was held constant at $\beta = 1$ and the same random 4-regular network was used for each sample. Small values of $\kappa$ correspond to the hazard rate of the disease getting smaller as it survives longer. This makes the decay of $\{T_n\}$ slower, and the outbreak probability approaches its theoretical minimum. The special case $\kappa = 1$ corresponds to Markov disease dynamics (i.e. exponential lifetime distribution). For large $\kappa$ the Weibull distribution approaches a delta function and risk is maximised.

Finally, we point out that the line for risk in Figure 3 stays below the line for vulnerability. This is a general fact. Notice that the right-hand side of equation (14) exactly corresponds to the form of the downstream cavity equations (8). We can conclude that $v_i$ provides an upper bound for $r_i$. That is to say, the probability that node $i$ will cause a major outbreak if it is the source is never greater than its chance of being infected by an outbreak that starts elsewhere.
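The effect of the Weibull shape parameter on the sequence $\{T_n\}$ can be explored numerically. The sketch below assumes the unit-scale Weibull density $\kappa t^{\kappa-1} e^{-t^{\kappa}}$ and uses the substitution $u = t^{\kappa}$, which turns the integral (5) into $T_n = \int_0^\infty e^{-u} \bigl(1 - e^{-\beta u^{1/\kappa}}\bigr)^n du$ and avoids the singularity of the density at $t = 0$ when $\kappa < 1$. The function name and the integration parameters are illustrative choices.

```python
from math import exp


def T_weibull(n, beta, kappa, u_max=50.0, steps=100_000):
    """T_n for unit-scale Weibull lifetimes, by the midpoint rule in u = t**kappa."""
    h = u_max / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        # infectious period t = u**(1/kappa), with u exponentially distributed
        total += exp(-u) * (1.0 - exp(-beta * u ** (1.0 / kappa))) ** n
    return h * total


beta = 1.0
for kappa in (0.25, 1.0, 4.0, 16.0):
    t1 = T_weibull(1, beta, kappa)
    t2 = T_weibull(2, beta, kappa)
    # Jensen's inequality guarantees T_2 >= T_1**2, with near-equality
    # expected when the lifetime distribution is sharply concentrated.
    print(f"kappa={kappa:5.2f}  T1={t1:.4f}  T2={t2:.4f}  T1^2={t1 * t1:.4f}")
```

Note that with $\beta$ held fixed, as here, the transmissibility $T_1$ itself changes with $\kappa$; holding $T_1$ constant instead would require adjusting $\beta$ for each value of $\kappa$.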


V. VULNERABILITY AND RISK RANKING 

We know from the previous section that the vulnerability of a node always exceeds its risk, but how do the risks and vulnerabilities of different nodes in the same network compare? For a given network and choice of $\beta$ and $\gamma(t)$, the cavity equations can be used to determine $r_i$ and $v_i$ for every node, and this information may be used to rank the nodes according to the risk they pose to the network, or the risk the network poses to them. As we will see, these rankings depend in detail on the nature of the disease.

Let us begin by examining vulnerability ranking. Node vulnerability depends on the disease specification only through $T_1$, so for simplicity we consider Markov dynamics, fixing $\gamma(t) = e^{-t}$. To generate Figure 4 a single random network of $N = 10^3$ nodes was created using the configuration model, and the vulnerability of its nodes computed for various $\beta$ using the cavity method. Two things are immediately noticeable from the plot: (i) there is a great deal of heterogeneity between nodes, and (ii) the rank order of vulnerability is not preserved as $\beta$ is varied. In particular, node 10 is more vulnerable than node 75 to less virulent diseases, yet is less vulnerable to more virulent ones.

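To help explain this counter-intuitive finding, we explore the behaviour of the cavity equations near the limits of strongly infectious ($T_1 = 1$) and weakly infectious ($T_1 = T_c$) diseases. Differentiating the vulnerability equation (7) with respect to $T_1$, we have

$$\partial v_i = \sum_{j \in \mathcal{N}(i)} \left(v_j^{(i)} + T_1 \partial v_j^{(i)}\right) \prod_{k \in \mathcal{N}(i) \setminus j} \left(1 - T_1 v_k^{(i)}\right) , \tag{16}$$

where $\partial$ is used as short-hand for $\partial / \partial T_1$. Now, as $T_1 \to 1$ we have $v_j^{(i)} \to 1$ also, therefore the product above gives zero unless it is empty. That is, $\partial v_i \to 0$ as $T_1 \to 1$ unless $i$ has only one neighbour. In general, the first non-zero derivative of $v_i$ at $T_1 = 1$ is

$$\partial^{|\mathcal{N}(i)|} v_i = (-1)^{|\mathcal{N}(i)| + 1} \, |\mathcal{N}(i)|! \prod_{j \in \mathcal{N}(i)} \left(1 + \partial v_j^{(i)}\right) , \tag{17}$$

where the sign factor arises because each factor $1 - T_1 v_j^{(i)}$ decreases to zero as $T_1 \to 1$. Similarly, we differentiate the downstream cavity equations (8) to find that, at $T_1 = 1$, $\partial v_j^{(i)}$ is only non-zero if $j$ has exactly one other neighbour $l \neq i$, in which case $\partial v_j^{(i)} = 1 + \partial v_l^{(j)}$. Tracing a path away from $i$ in the direction of $j$ we must eventually, after $L_{ij}$ steps, reach a node with degree not equal to two (recall that we ignore the possibility of cycles), whose cavity derivative in either case will be zero. We conclude that

$$\left. \partial^{|\mathcal{N}(i)|} v_i \right|_{T_1 = 1} = (-1)^{|\mathcal{N}(i)| + 1} \, |\mathcal{N}(i)|! \prod_{j \in \mathcal{N}(i)} L_{ij} .$$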

Turning attention now to the percolation transition, we set $T_1 = T_c$, where we have $v_i = v_j^{(i)} = 0$. Therefore, from (16),

$$\partial v_i = T_c \sum_{j \in \mathcal{N}(i)} \partial v_j^{(i)} = T_c \sum_{j \in \mathcal{N}(i)} T_c \sum_{l \in \mathcal{N}(j) \setminus i} \partial v_l^{(j)} = \cdots . \tag{18}$$

To make progress, let us suppose that there is some bulk network $B$, which we assume has no special structure, and that we have solved the cavity equations for that network. To find the value of $\partial v_i$, we must carry out the expansion above for all paths $p$ from $i$ to the bulk $B$. We may write

$$\partial v_i \approx \sum_{p \,:\, i \to B} T_c^{|p|} \, \partial v_B , \tag{19}$$

where $|p|$ is the length of the path $p$ and $v_B$ is the mean cavity vulnerability in the bulk.

Bringing the above results together, we have for each node $i$ a pair of expansions,

$$v_i \approx 1 - (1 - T_1)^{|\mathcal{N}(i)|} \prod_{j \in \mathcal{N}(i)} L_{ij} , \qquad v_i \approx (T_1 - T_c) \sum_{p \,:\, i \to B} T_c^{|p|} \, \partial v_B , \tag{20}$$

valid near $T_1 = 1$ and near $T_1 = T_c$ respectively. The qualitative insight provided by these equations is the following: vulnerability to highly infectious diseases is mainly controlled by the number of neighbours a node has, whereas vulnerability to weakly infectious diseases depends more subtly on the number and length of paths connecting the node to the infected bulk. The puzzling case of nodes 10 and 75 in Figure 4 is explained thus: node 75 has higher degree (five, compared to three), yet is the source of just 91 paths of length three, compared to 136 for node 10. We can therefore expect node 10 to be more vulnerable close to $T_c$, but less vulnerable for large $T_1$.

FIG. 5: Left: node risk as a function of $\kappa$, with $\beta$ varied so as to hold constant $T_1 = 1/2$. Each solid line corresponds to a different node, shaded according to degree (lighter shades are higher degree); the dashed line gives the mean risk $\bar{r}$. Right: close up of relative risk $r_i / \bar{r}$ for several nodes; crossing lines correspond to changes in the risk ranking.

As one might expect in light of the above, the ranking of nodes according to their risk is also not preserved under varying $\beta$. More interestingly, risk rankings are additionally sensitive to the memory characteristics of the disease. Figure 5 shows the result of the cavity equations in a random network with maximum degree five, for Weibull-distributed infectious periods with $\kappa$ varying across two orders of magnitude. This time $\beta$ has also been varied simultaneously in order to keep $T_1$ fixed. Although node risk is broadly correlated with degree, this is not a hard rule: there is considerable variation, and many changes to the risk ranking occur as $\kappa$ is varied (even though node vulnerability does not change since $T_1$ is held constant). Curiously, Markov dynamics ($\kappa = 1$) appear to represent an extreme case, with most nodes experiencing either their highest or lowest relative risk at that point.
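The path counts used to explain the behaviour of nodes 10 and 75 in this section can be imitated with a short routine. The sketch below counts non-backtracking walks of a given length from a source node, which coincide with paths when the network is locally tree-like; whether this matches the exact counting convention behind the quoted figures of 91 and 136 is an assumption. The function name and the two small test networks are hypothetical.

```python
def count_non_backtracking_walks(adjacency, source, length):
    """Count walks of `length` steps from `source` that never reverse a step."""
    # frontier maps a directed edge (previous, current) to the number of
    # non-backtracking walks from `source` whose final step is that edge
    frontier = {(source, j): 1 for j in adjacency[source]}
    for _ in range(length - 1):
        extended = {}
        for (i, j), count in frontier.items():
            for l in adjacency[j]:
                if l != i:  # forbid stepping straight back to i
                    extended[(j, l)] = extended.get((j, l), 0) + count
        frontier = extended
    return sum(frontier.values())


# Hypothetical examples: the complete graph on four nodes, and a triangle
# 0-1-2 with a pendant node 3 attached to node 2.
complete4 = {i: [j for j in range(4) if j != i] for i in range(4)}
triangle_tail = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

print(count_non_backtracking_walks(complete4, 0, 3))
print(count_non_backtracking_walks(triangle_tail, 3, 3))
```

In the complete graph each walk has three choices for its first step and two for each later step, giving twelve walks of length three. A node of lower degree can still be the source of more such walks than a node of higher degree if its neighbours are themselves better connected, which is the mechanism invoked for nodes 10 and 75.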


VI. CONCLUSION 

In the work presented above we have seen how the cavity method can be used to calculate measures of vulnerability and risk in network epidemic models. The vulnerability of a node to an ongoing outbreak is recovered from the solution of the downstream cavity equations (8), while the risk of an outbreak occurring with a given starting node is found by the upstream cavity equations (6). Node vulnerability was found to be independent of the details of the disease lifetime distribution; however, some nodes are more vulnerable than others to weaker infections, yet less vulnerable to stronger ones. This fact can be understood by an analysis of the cavity equations with parameters in the neighbourhood of percolation and complete infection.

The story for node risk is even more complex, as it depends on the memory characteristics of the disease in a non-trivial way. In particular, we saw how risk is maximised by diseases with deterministic lifetime distributions. This result suggests that real-world diseases, which are subject to evolutionary pressure to optimise their chance of spreading, should have infectious periods that are much less variable than the equivalent exponential distribution, and indeed this appears to be the case [?]. This dependence on disease memory also carries over to the node risk rankings, as seen in Figure 5.

The main drawback of the method is, of course, the reliance on a tree approximation. Although the method has performed well for all the networks considered in this article (despite most of them containing many cycles), accurate predictions cannot be made for networks with specific structure that favours the existence of cliques or short cycles. This is a common problem in the study of epidemics on networks, which is usually tackled via ‘moment closure’ methods for differential equations [?]. Indeed an explicit link has been made between the cavity method and a particular moment closure scheme [?].

Looking to the future, three interesting questions remain unanswered: (i) why do node risk rankings depend on the memory properties of the disease? (ii) why do Markov dynamics extremise relative risk? (iii) how might the results described here be modified in networks with high local clustering? Beyond these problems, several straightforward extensions to the work described here are possible. One may choose to study non-Markov infection rates, although this only has the effect of changing the definition of $T_n$. More interestingly, fully time-dependent cavity equations can be derived to give a more detailed view of the disease progression (as was originally done for vulnerability in [?]), and recent work has shown how the method may be used to infer the origin of an outbreak [15]. For added realism in applications, further heterogeneity can be introduced by allowing $\beta$ and $\gamma$ to vary across the network. More generally, the cavity method is widely applicable to other models of cascades on networks, for example, meme spread in social networks or systemic risk in financial markets.